431 Class 18

Thomas E. Love, Ph.D.

2023-10-31

Today’s Agenda

  • One-Factor Analysis of Variance
    • Using Regression to Develop an ANOVA model
    • Methods for pairwise multiple comparisons
  • Two-Factor Analysis of Variance
    • Building a two-way ANOVA model
    • Thinking about interaction
  • Examples Using the ohio20 data

Today’s Packages

library(readxl) # to read in an .xlsx file
library(ggrepel) # to help label residual plots
library(ggdist) # new package not previously used
library(broom)
library(kableExtra)
library(Hmisc); library(mosaic)
library(janitor)
library(naniar)
library(patchwork)
library(tidyverse)

theme_set(theme_bw())
options(tidyverse.quiet = TRUE)
options(dplyr.summarise.inform = FALSE)

One-Factor Analysis of Variance: Comparing Multiple Means with Independent Samples

Today’s Data (ohio_2020)

Each row of ohio_2020.xlsx describes one of Ohio’s 88 counties:

  • FIPS code (identifier for mapping), state and county name
  • health outcomes (standardized: more positive means better outcomes, because we’ve taken the negative of the Z score CHR provides)
  • health behavior ranking (1-88, we’ll divide into 4 groups)
  • clinical care ranking (1-88, we’ll split into 3 groups)
  • proportion of county residents who live in rural areas
  • median income, in dollars
  • proportion of votes in the 2016 Presidential Election for Donald Trump

Importing Data / Creating Factors

ohio20 <- read_xlsx("c18/data/ohio_2020.xlsx") |>
  mutate(behavior = Hmisc::cut2(rk_behavior, g = 4),
         clin_care = Hmisc::cut2(rk_clin_care, g = 3)) |>
  mutate(behavior = fct_recode(behavior,
            "Best" = "[ 1,23)", "High" = "[23,45)",
            "Low" = "[45,67)", "Worst" = "[67,88]")) |>
  mutate(clin_care = fct_recode(clin_care,
            "Strong" = "[ 1,31)", "Middle" = "[31,60)",
            "Weak" = "[60,88]")) |>
  select(FIPS, state, county, outcomes, behavior, clin_care, 
         everything())

A Quick Look at the Data

ohio20 |> filter(county == "Cuyahoga") |>
  select(FIPS, county, outcomes, behavior, clin_care) 
# A tibble: 1 × 5
  FIPS  county   outcomes behavior clin_care
  <chr> <chr>       <dbl> <fct>    <fct>    
1 39035 Cuyahoga   -0.807 Worst    Strong   
ggplot(ohio20, aes(x = "", y = outcomes)) + geom_violin(fill = "orange") +
  geom_boxplot(width = 0.4) + coord_flip() + labs(x = "")

Key Measure Details

  • outcomes = quantity that describes the county’s premature death and quality of life results, weighted equally and standardized (z scores).
    • Higher (more positive) values indicate better outcomes in this county.

Key Measure Details

  • behavior = (Best/High/Low/Worst) reflecting adult smoking, obesity, food environment, inactivity, exercise, drinking, alcohol-related driving deaths, sexually transmitted infections and teen births.
    • Counties in the Best group had the best behavior results.

Key Measure Details

  • clin_care = (Strong/Middle/Weak) reflects rates of uninsured, care providers, preventable hospital stays, diabetes monitoring and mammography screening.
    • Strong means that clinical care is strong in this county.

Today’s First Question

  1. How do average health outcomes vary across groups of counties defined by health behavior?

K Samples: Comparing Means

  1. What is the outcome under study?
  2. What are the (in this case, \(K \geq 2\)) treatment/exposure groups?
  3. Were the data in fact collected using independent samples?
  4. Are the data random samples from the population(s) of interest? Or is there at least a reasonable argument for generalizing from the samples to the population(s)?
  5. What is the significance level (or, the confidence level) we require?
  6. Are we doing one-sided or two-sided testing? (usually 2-sided)
  7. What does the distribution of each individual sample tell us about which inferential procedure to use?
  8. Are there statistically detectable differences between population means?
  9. If an overall test rejects the null, can we identify pairwise comparisons of means that show detectable differences using an appropriate procedure that protects against Type I error expansion due to multiple comparisons?

Question 1

Do average health outcomes differ by health behavior?

ggplot(ohio20, aes(x = behavior, y = outcomes, 
                   fill = behavior)) +
  geom_violin(alpha = 0.25) +
  geom_boxplot(width = 0.25) +
  guides(fill = "none") + 
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  labs(x = "Health Behavior Group", 
       y = "Health Outcomes (higher = better health)",
       title = "Health Outcomes across Behavior Groups",
       subtitle = "Ohio's 88 counties, 2020 County Health Rankings",
       caption = "Source: https://www.countyhealthrankings.org/app/ohio/2020/downloads")

Question 1

Question 1 Raindrop Plots?

ggplot(ohio20, aes(x = behavior, y = outcomes, 
                   fill = behavior)) +
  ggdist::stat_halfeye(adjust = 0.5, width = 0.3, .width = c(0.5, 1)) +
  ggdist::stat_dots(side = "left", dotsize = 1, justification = 1.05, binwidth = 0.1) +
  guides(fill = "none") + 
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  labs(x = "Health Behavior Group", 
       y = "Health Outcomes (higher = better health)",
       title = "Health Outcomes across Behavior Groups",
       subtitle = "Ohio's 88 counties, 2020 County Health Rankings",
       caption = "Source: https://www.countyhealthrankings.org/app/ohio/2020/downloads")

Question 1 Raindrop Plots?

Question 1 Numerical Summaries

How do average health outcomes vary across groups of counties defined by health behavior?

mosaic::favstats(outcomes ~ behavior, data = ohio20) |>
  rename(na = missing) |> kbl(digits = 2) |> kable_classic_2(font_size = 28)
behavior min Q1 median Q3 max mean sd n na
Best -0.33 0.60 0.86 1.46 2.17 0.96 0.57 22 0
High -0.35 0.00 0.30 0.55 0.77 0.25 0.35 22 0
Low -1.15 -0.52 -0.09 0.16 0.73 -0.18 0.47 22 0
Worst -2.05 -1.75 -0.87 -0.59 -0.08 -1.04 0.63 22 0

Note that there is no missing data here.

Analysis of Variance: Question 1

Does the mean outcomes result differ detectably across the behavior groups?

\[ H_0: \mu_{Best} = \mu_{High} = \mu_{Low} = \mu_{Worst} \mbox{ vs. } \\ H_A: \mbox{At least one } \mu \mbox{ is different.} \]

To test this set of hypotheses, we will build a linear model to predict each county’s outcome based on what behavior group the county is in.

Building the Linear Model

Can we detect differences in population means of outcomes across behavior groups, with \(\alpha = 0.10\)?

model_one <- lm(outcomes ~ behavior, data = ohio20)
tidy(model_one, conf.int= TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 3) |> kable_classic_2(font_size = 28)
term estimate std.error conf.low conf.high p.value
(Intercept) 0.965 0.110 0.781 1.148 0
behaviorHigh -0.711 0.156 -0.971 -0.451 0
behaviorLow -1.142 0.156 -1.402 -0.883 0
behaviorWorst -2.006 0.156 -2.265 -1.746 0

How do we interpret this result?

Meaning of indicator variables?

outcomes = 0.96 - 0.71 behaviorHigh 
           - 1.14 behaviorLow - 2.01 behaviorWorst
group behaviorHigh behaviorLow behaviorWorst
Best 0 0 0
High 1 0 0
Low 0 1 0
Worst 0 0 1
  • So what is the predicted outcomes score for a county in the High behavior group, according to this model?

Interpreting the Indicator Variables

outcomes = 0.96 - 0.71 behaviorHigh 
           - 1.14 behaviorLow - 2.01 behaviorWorst

What predictions does the model make? Do these make sense?

group High Low Worst Prediction
Best 0 0 0 0.96
High 1 0 0 0.96 - 0.71 = 0.25
Low 0 1 0 0.96 - 1.14 = -0.18
Worst 0 0 1 0.96 - 2.01 = -1.05

Interpreting the Indicator Variables

outcomes = 0.96 - 0.71 behaviorHigh 
           - 1.14 behaviorLow - 2.01 behaviorWorst
ohio20 |> group_by(behavior) |>
  summarise(n = n(), mean = round_half_up(mean(outcomes),2)) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 28, full_width = F)
behavior n mean
Best 22 0.96
High 22 0.25
Low 22 -0.18
Worst 22 -1.04

ANOVA for Linear Model

Are there detectable differences in mean outcome across the behavior groups?

\[ H_0: \mu_{Best} = \mu_{High} = \mu_{Low} = \mu_{Worst} \mbox{ vs. } \\ H_A: \mbox{At least one } \mu \mbox{ is different.} \]

anova(model_one)
Analysis of Variance Table

Response: outcomes
          Df Sum Sq Mean Sq F value    Pr(>F)    
behavior   3 46.421 15.4736  57.718 < 2.2e-16 ***
Residuals 84 22.519  0.2681                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So, what’s in the ANOVA table? (df)

The ANOVA table reports here on a single factor (behavior group) with 4 levels, and on the residual variation in health outcomes.

anova(model_one)[1:2]
          Df Sum Sq
behavior   3 46.421
Residuals 84 22.519

Degrees of Freedom (df) is an index of sample size…

  • df for our factor (behavior) is one less than the number of categories. We have four behavior groups, so 3 degrees of freedom.
  • Adding df(behavior) + df(Residuals) = 3 + 84 = 87 = df(Total), one less than the number of observations (counties) in Ohio.
  • n observations and g groups yield \(n - g\) residual df in a one-factor ANOVA table.

ANOVA table: Sum of Squares

anova(model_one)[1:3]
          Df Sum Sq Mean Sq
behavior   3 46.421 15.4736
Residuals 84 22.519  0.2681

Sum of Squares (Sum Sq, or SS) is an index of variation…

  • SS(factor), here SS(behavior) measures the amount of variation accounted for by the behavior groups in our model_one.
  • The total variation in outcomes to be explained by the model is SS(factor) + SS(Residuals) = SS(Total) in a one-factor ANOVA table.
  • We describe the proportion of variation explained by a one-factor ANOVA model with \(\eta^2\) (“eta-squared”: same as Multiple \(R^2\))

\[ \eta^2 = \frac{SS(\mbox{behavior})}{SS(\mbox{Total})} = \frac{46.421}{46.421+22.519} = \frac{46.421}{68.94} \approx 0.673 \]
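As a quick sanity check, the \(\eta^2\) arithmetic can be reproduced directly in R, with the SS values copied from the ANOVA table above:

```r
# Verify eta-squared by hand; SS values are hardcoded from the table above
ss_behavior <- 46.421
ss_residuals <- 22.519
eta_sq <- ss_behavior / (ss_behavior + ss_residuals)
round(eta_sq, 3)  # 0.673, matching Multiple R-squared for model_one
```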

ANOVA table: (Mean Square, F ratio)

anova(model_one)[1:4]
          Df Sum Sq Mean Sq F value
behavior   3 46.421 15.4736  57.718
Residuals 84 22.519  0.2681        

Mean Square (Mean Sq, or MS) = Sum of Squares / df

\[ MS(\mbox{behavior}) = \frac{SS(\mbox{behavior})}{df(\mbox{behavior})} = \frac{46.421}{3} \approx 15.4736 \]

  • MS(Residuals) estimates the residual variance, the square of the residual standard deviation (residual standard error in earlier work).
  • The ratio of MS values is the ANOVA F value.

\[ {\mbox{ANOVA }} F = \frac{MS(\mbox{behavior})}{MS(\mbox{Residuals})} = \frac{15.4736}{0.2681} \approx 57.718 \]

ANOVA Table p value

tidy(anova(model_one)) |> kbl(digits = 3) |> 
  kable_classic_2(font_size = 28, full_width = F)
term df sumsq meansq statistic p.value
behavior 3 46.421 15.474 57.718 0
Residuals 84 22.519 0.268 NA NA
  • The p value is derived from the ANOVA F statistic, as compared to the F distribution.
  • Which F distribution to use is specified by the two degrees of freedom values…
pf(57.718, df1 = 3, df2 = 84, lower.tail = FALSE)
[1] 2.377323e-20

Alternative ANOVA displays

glance(model_one) |> select(r.squared, statistic, df, df.residual, p.value) |>
  kbl() |> kable_minimal(font_size = 24, full_width = FALSE)
r.squared statistic df df.residual p.value
0.673348 57.71815 3 84 0

or

summary(aov(model_one))
            Df Sum Sq Mean Sq F value Pr(>F)    
behavior     3  46.42  15.474   57.72 <2e-16 ***
Residuals   84  22.52   0.268                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So, what’s the conclusion? Is this a surprise?

Multiple Comparisons

What’s Left? (Multiple Comparisons)

  1. If an overall test rejects the null, can we identify pairwise comparisons of means that show detectable differences using an appropriate procedure that protects against Type I error expansion due to multiple comparisons?

Yes.

Two Methods for Multiple Comparisons

There are two methods we’ll study to identify specific pairs of means where we have statistically detectable differences, while dealing with the problem of multiple comparisons.

  • Holm-Bonferroni pairwise comparisons
  • Tukey’s HSD (Honestly Significant Differences) approach

Compare behavior group means of outcomes?

ANOVA tells us that there is strong evidence that they aren’t all the same. Which ones are different from which?

anova(lm(outcomes ~ behavior, data = ohio20))
Analysis of Variance Table

Response: outcomes
          Df Sum Sq Mean Sq F value    Pr(>F)    
behavior   3 46.421 15.4736  57.718 < 2.2e-16 ***
Residuals 84 22.519  0.2681                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Is, for example, Best detectably different from Worst?

Could we just run a bunch of t tests?

This approach assumes that no adjustment is needed for the fact that you are doing multiple comparisons simultaneously.

pairwise.t.test(ohio20$outcomes, ohio20$behavior, 
                p.adjust.method = "none")

    Pairwise comparisons using t tests with pooled SD 

data:  ohio20$outcomes and ohio20$behavior 

      Best    High    Low    
High  1.8e-05 -       -      
Low   1.4e-10 0.007   -      
Worst < 2e-16 1.6e-12 3.6e-07

P value adjustment method: none 

The problem of Multiple Comparisons

  • The more comparisons you do simultaneously, the more likely you are to make an error.

In the worst case scenario, suppose you do two tests: first A vs. B, then A vs. C, each at the \(\alpha = 0.10\) level.

  • What is the combined error rate across those two t tests?

The problem of Multiple Comparisons

Run the first test. Make a Type I error 10% of the time.

A vs B Type I error Probability
Yes 0.1
No 0.9

Now, run the second test. Assume (perhaps wrongly) that comparing A to C is independent of your A-B test result. What is the error rate now?

The problem of Multiple Comparisons

Assuming there is a 10% chance of making an error in either test, independently …

Error in A vs. C No Error Total
Type I error in A vs. B 0.01 0.09 0.10
No Type I error in A-B 0.09 0.81 0.90
Total 0.10 0.90 1.00

So you will make an error in the A-B or A-C comparison 19% of the time, rather than the nominal \(\alpha = 0.10\) error rate.

But we’re building SIX tests

  1. Best vs. High
  2. Best vs. Low
  3. Best vs. Worst
  4. High vs. Low
  5. High vs. Worst
  6. Low vs. Worst

and if they were independent, and each done at a 5% error rate, we could still wind up with an error rate of

\(.05 + (.95)(.05) + (.95)(.95)(.05) + (.95)^3(.05) + (.95)^4(.05) + (.95)^5(.05)\) = .265

Or worse, if they’re not independent.
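The sum above is just the expansion of \(1 - (1 - \alpha)^m\) for \(m\) independent tests at level \(\alpha\). A small helper (not part of the original slides) confirms both worst-case rates:

```r
# Family-wise Type I error rate for m independent tests, each at level alpha
fwer <- function(alpha, m) 1 - (1 - alpha)^m

fwer(0.10, 2)  # 0.19, the two-test example
fwer(0.05, 6)  # 0.2649..., i.e. the .265 computed above for six tests
```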

The Bonferroni Method

If we do 6 tests, we could require \(\alpha\) = 0.05 / 6 = 0.0083 for each test, which maintains an overall error rate no higher than \(\alpha = 0.05\) across the 6 tests.

  • Or, R can adjust the p values directly…
pairwise.t.test(ohio20$outcomes, ohio20$behavior, 
                p.adjust.method = "bonferroni")

    Pairwise comparisons using t tests with pooled SD 

data:  ohio20$outcomes and ohio20$behavior 

      Best    High    Low    
High  0.00011 -       -      
Low   8.3e-10 0.04224 -      
Worst < 2e-16 9.4e-12 2.1e-06

P value adjustment method: bonferroni 

We still detect a meaningful difference between each pair of groups.

Better Approach: Holm-Bonferroni

Suppose you have \(m\) comparisons, with p-values sorted from low to high as \(p_1\), \(p_2\), …, \(p_m\).

  • Is \(p_1 < \alpha/m\)? If so, reject \(H_1\) and continue, otherwise STOP.
  • Is \(p_2 < \alpha/(m-1)\)? If so, reject \(H_2\) and continue, else STOP.
  • and so on…
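This stepwise rule is what p.adjust() implements; a minimal sketch with three hypothetical p-values (compare each adjusted value to \(\alpha\) directly):

```r
# Holm adjustment by hand, then via p.adjust(), for hypothetical p-values
p <- c(0.001, 0.010, 0.040)   # sorted low to high, so m = 3
cummax(p * (3:1))             # multiply by m, m-1, ..., 1, then force monotonicity
p.adjust(p, method = "holm")  # 0.003 0.020 0.040
```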

Holm-Bonferroni Approach

This is uniformly more powerful than Bonferroni, while preserving the overall false positive rate at \(\alpha\).

pairwise.t.test(ohio20$outcomes, ohio20$behavior, 
                p.adjust.method = "holm")

    Pairwise comparisons using t tests with pooled SD 

data:  ohio20$outcomes and ohio20$behavior 

      Best    High    Low    
High  3.5e-05 -       -      
Low   5.5e-10 0.007   -      
Worst < 2e-16 7.9e-12 1.1e-06

P value adjustment method: holm 

Tukey’s Honestly Significant Differences

Tukey’s HSD approach is a better choice for pre-planned comparisons with a balanced (or nearly balanced) design. It provides confidence intervals and an adjusted p value for each comparison.

  • Let’s run some confidence intervals to yield an overall 99% confidence level, even with 6 tests…
TukeyHSD(aov(lm(outcomes ~ behavior, data = ohio20)), 
         conf.level = 0.99, ordered = TRUE)

Tukey’s Honestly Significant Differences

  Tukey multiple comparisons of means
    99% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = lm(outcomes ~ behavior, data = ohio20))

$behavior
                diff         lwr       upr     p adj
Low-Worst  0.8632211  0.36223069 1.3642115 0.0000021
High-Worst 1.2945256  0.79353515 1.7955159 0.0000000
Best-Worst 2.0056105  1.50462011 2.5066009 0.0000000
High-Low   0.4313045 -0.06968593 0.9322949 0.0348350
Best-Low   1.1423894  0.64139903 1.6433798 0.0000000
Best-High  0.7110850  0.21009456 1.2120753 0.0001023

Tidying Tukey HSD 99% CIs

model_one <- lm(outcomes ~ behavior, data = ohio20)
tukey_one <- tidy(TukeyHSD(aov(model_one), ordered = TRUE, conf.level = 0.99))
tukey_one |> rename(null = null.value) |> 
  kbl(digits = 3) |> kable_classic_2(font_size = 28)
term contrast null estimate conf.low conf.high adj.p.value
behavior Low-Worst 0 0.863 0.362 1.364 0.000
behavior High-Worst 0 1.295 0.794 1.796 0.000
behavior Best-Worst 0 2.006 1.505 2.507 0.000
behavior High-Low 0 0.431 -0.070 0.932 0.035
behavior Best-Low 0 1.142 0.641 1.643 0.000
behavior Best-High 0 0.711 0.210 1.212 0.000

Plot Tukey HSD intervals

ggplot(tukey_one, aes(x = reorder(contrast, -estimate), 
                      y = estimate)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + 
  geom_hline(yintercept = 0, col = "red", 
             linetype = "dashed") +
  geom_text(aes(label = round(estimate,2)), nudge_x = -0.2) +
  labs(x = "Contrast between Behavior Groups", 
       y = "Estimated Effect, with 99% Tukey HSD interval",
       title = "Estimated Effects, with Tukey HSD 99% Confidence Intervals",
       subtitle = "Comparing Outcomes by Behavior Group, ohio20 data")

Plot Tukey HSD intervals

ANOVA Assumptions

The assumptions behind analysis of variance are those of a linear model. Of specific interest are:

  • The samples obtained from each group are independent.
  • Ideally, the samples from each group are a random sample from the population described by that group.
  • In the population, the variance of the outcome in each group is equal. (This is less of an issue if our study involves a balanced design.)
  • In the population, we have Normal distributions of the outcome in each group.

Happily, the ANOVA F test is fairly robust to violations of the Normality assumption.
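The slides check these assumptions graphically. If a formal check of the equal-variance assumption is wanted, Bartlett's test in base R is one option (though it is itself sensitive to non-Normality); a sketch on R's built-in PlantGrowth data rather than ohio20:

```r
# Formal check of equal variances across groups, illustrated on the
# built-in PlantGrowth data (plant weight measured in 3 treatment groups)
bartlett.test(weight ~ group, data = PlantGrowth)
```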

Residual Plots for model_one

aug_one <- augment(model_one, ohio20)

p1 <- ggplot(aug_one, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = F,
              lty = "dashed", col = "red") +
  geom_text_repel(data = aug_one |> 
                    slice_max(abs(.resid), n = 3), 
                  aes(label = county)) +
  labs(title = "model_one Residuals vs. Fitted",
       x = "Fitted Value from model_one",
       y = "Residuals from model_one")

p2 <- ggplot(aug_one, aes(sample = .resid)) +
  geom_qq() + geom_qq_line(col = "red") + 
  labs(title = "model_one Residuals",
       y = "")

p3 <- ggplot(aug_one, aes(y = .resid, x = "")) +
  geom_violin(fill = "aquamarine") +
  geom_boxplot(width = 0.5) + 
  labs(y = "", x = "")

p1 + p2 + p3 + plot_layout(widths = c(5, 4, 1))

Residual Plots for model_one

Can we avoid assuming equal population variances?

Yes, but this isn’t exciting if we have a balanced design.

oneway.test(outcomes ~ behavior, data = ohio20)

    One-way analysis of means (not assuming equal variances)

data:  outcomes and behavior
F = 43.145, num df = 3.000, denom df = 45.494, p-value = 2.349e-13
  • Note that this approach uses a fractional degrees of freedom calculation in the denominator.

The Kruskal-Wallis Test

If you thought the data were severely skewed, you might try:

kruskal.test(outcomes ~ behavior, data = ohio20)

    Kruskal-Wallis rank sum test

data:  outcomes by behavior
Kruskal-Wallis chi-squared = 61.596, df = 3, p-value = 2.681e-13
  • \(H_0\): The four behavior groups have the same center to their outcomes distributions.
  • \(H_A\): At least one group has a shifted distribution, with a different center to its outcomes.

What would be the conclusion here?
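If a Kruskal-Wallis test rejects its null, a rank-based pairwise follow-up is available via pairwise.wilcox.test(), which accepts the same p.adjust.method choices we used earlier; a sketch on R's built-in chickwts data rather than ohio20:

```r
# Kruskal-Wallis test, then Holm-adjusted pairwise Wilcoxon rank sum
# comparisons, using the built-in chickwts data (chick weight by feed type)
kruskal.test(weight ~ feed, data = chickwts)
pairwise.wilcox.test(chickwts$weight, chickwts$feed, p.adjust.method = "holm")
```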

K Samples: Comparing Means

  1. What is the outcome under study?
  2. What are the (in this case, \(K \geq 2\)) treatment/exposure groups?
  3. Were the data in fact collected using independent samples?
  4. Are the data random samples from the population(s) of interest? Or is there at least a reasonable argument for generalizing from the samples to the population(s)?
  5. What is the significance level (or, the confidence level) we require?
  6. Are we doing one-sided or two-sided testing? (usually 2-sided)
  7. What does the distribution of each individual sample tell us about which inferential procedure to use?
  8. Are there statistically detectable differences between population means?
  9. If an overall test rejects the null, can we identify pairwise comparisons of means that show detectable differences using an appropriate procedure that protects against Type I error expansion due to multiple comparisons?

Two-Factor Analysis of Variance

A Two-Factor Example

Suppose we want to understand simultaneously the impacts of two factors on our standardized health outcomes:

  • health behavior ranking (divided into 4 groups)
  • clinical care ranking (divided into 3 groups)

and it is possible that the impact of the health behavior ranking on our outcome measure may depend on the clinical care ranking, and vice versa (i.e., the factors may interact).

Interaction Plot is a plot of means

Calculate the group means.

out_summary <- ohio20 |>
  group_by(behavior, clin_care) |>
  summarise(mean_out = mean(outcomes, na.rm = TRUE))

out_summary
# A tibble: 12 × 3
# Groups:   behavior [4]
   behavior clin_care mean_out
   <fct>    <fct>        <dbl>
 1 Best     Strong       1.14 
 2 Best     Middle       0.751
 3 Best     Weak         0.635
 4 High     Strong       0.205
 5 High     Middle       0.197
 6 High     Weak         0.397
 7 Low      Strong      -0.217
 8 Low      Middle      -0.131
 9 Low      Weak        -0.192
10 Worst    Strong      -1.11 
11 Worst    Middle      -0.987
12 Worst    Weak        -1.04 

Now, build the interaction plot.

Looking for a substantial interaction (non-parallel lines)

ggplot(out_summary, aes(x = behavior, y = mean_out)) +
  geom_line(aes(group = clin_care, color = clin_care)) +
  geom_point(aes(color = clin_care))

Now, build the interaction plot.

Two-Way ANOVA without Interaction

model_noint <- lm(outcomes ~ behavior + clin_care, data = ohio20)

model_noint

Call:
lm(formula = outcomes ~ behavior + clin_care, data = ohio20)

Coefficients:
    (Intercept)     behaviorHigh      behaviorLow    behaviorWorst  
        0.99885         -0.68556         -1.12261         -1.98180  
clin_careMiddle    clin_careWeak  
       -0.09405         -0.06179  

Two-Way ANOVA without Interaction

tidy(model_noint, conf.int= TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 28)
term estimate std.error conf.low conf.high p.value
(Intercept) 1.00 0.12 0.79 1.20 0.00
behaviorHigh -0.69 0.16 -0.96 -0.42 0.00
behaviorLow -1.12 0.16 -1.39 -0.85 0.00
behaviorWorst -1.98 0.17 -2.26 -1.70 0.00
clin_careMiddle -0.09 0.14 -0.33 0.14 0.50
clin_careWeak -0.06 0.15 -0.31 0.18 0.67

ANOVA setup for No Interaction model

anova(model_noint)
Analysis of Variance Table

Response: outcomes
          Df Sum Sq Mean Sq F value Pr(>F)    
behavior   3 46.421 15.4736 56.6609 <2e-16 ***
clin_care  2  0.126  0.0630  0.2307 0.7945    
Residuals 82 22.393  0.2731                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Proportion of total SS explained by the two factors?
    • (46.421 + 0.126) / (46.421 + 0.126 + 22.393) = 46.541 / 68.941 = 0.675

What if we flipped the order?

anova(lm(outcomes ~ clin_care + behavior, data = ohio20))
Analysis of Variance Table

Response: outcomes
          Df Sum Sq Mean Sq F value    Pr(>F)    
clin_care  2  7.232  3.6159  13.241  1.04e-05 ***
behavior   3 39.315 13.1050  47.988 < 2.2e-16 ***
Residuals 82 22.393  0.2731                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Note that SS(Residuals) is unchanged from previous slide.

Residual Plots?

par(mfrow = c(1,2)); plot(model_noint); par(mfrow = c(1,1))

Two-Way ANOVA with Interaction

model_inter <- lm(outcomes ~ behavior * clin_care, data = ohio20)
model_inter

Call:
lm(formula = outcomes ~ behavior * clin_care, data = ohio20)

Coefficients:
                  (Intercept)                   behaviorHigh  
                       1.1393                        -0.9341  
                  behaviorLow                  behaviorWorst  
                      -1.3566                        -2.2532  
              clin_careMiddle                  clin_careWeak  
                      -0.3880                        -0.5041  
 behaviorHigh:clin_careMiddle    behaviorLow:clin_careMiddle  
                       0.3794                         0.4744  
behaviorWorst:clin_careMiddle     behaviorHigh:clin_careWeak  
                       0.5149                         0.6959  
    behaviorLow:clin_careWeak    behaviorWorst:clin_careWeak  
                       0.5298                         0.5789  

ANOVA setup for Interaction model

anova(model_inter)
Analysis of Variance Table

Response: outcomes
                   Df Sum Sq Mean Sq F value Pr(>F)    
behavior            3 46.421 15.4736 55.2403 <2e-16 ***
clin_care           2  0.126  0.0630  0.2249 0.7991    
behavior:clin_care  6  1.105  0.1841  0.6574 0.6841    
Residuals          76 21.289  0.2801                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • We check the interaction first, when dealing with these models.
  • Proportion of total SS explained by the interaction?
    • 1.105 / (7.232 + 39.315 + 1.105 + 21.289) = 1.105 / 68.941 = 0.016

What if we flipped the order?

anova(lm(outcomes ~ clin_care * behavior, data = ohio20))
Analysis of Variance Table

Response: outcomes
                   Df Sum Sq Mean Sq F value    Pr(>F)    
clin_care           2  7.232  3.6159 12.9086 1.492e-05 ***
behavior            3 39.315 13.1050 46.7845 < 2.2e-16 ***
clin_care:behavior  6  1.105  0.1841  0.6574    0.6841    
Residuals          76 21.289  0.2801                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Remember to check the interaction first.

Two-Way ANOVA with Interaction

tidy(model_inter, conf.int= TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 18)
term estimate std.error conf.low conf.high p.value
(Intercept) 1.14 0.15 0.89 1.38 0.00
behaviorHigh -0.93 0.26 -1.37 -0.50 0.00
behaviorLow -1.36 0.25 -1.77 -0.94 0.00
behaviorWorst -2.25 0.30 -2.76 -1.75 0.00
clin_careMiddle -0.39 0.26 -0.82 0.05 0.14
clin_careWeak -0.50 0.34 -1.07 0.06 0.14
behaviorHigh:clin_careMiddle 0.38 0.38 -0.25 1.01 0.32
behaviorLow:clin_careMiddle 0.47 0.38 -0.16 1.10 0.21
behaviorWorst:clin_careMiddle 0.51 0.44 -0.22 1.25 0.25
behaviorHigh:clin_careWeak 0.70 0.46 -0.06 1.46 0.13
behaviorLow:clin_careWeak 0.53 0.44 -0.21 1.27 0.23
behaviorWorst:clin_careWeak 0.58 0.45 -0.18 1.34 0.21

Residual Plots (with interaction)?

par(mfrow = c(1,2)); plot(model_inter); par(mfrow = c(1,1))

What conclusions should we draw?

  • The interaction term only accounts for a small percentage of the variation in our outcome.
  • The interaction term also only accounts for a small percentage of the variation we explain with our model.
  • The interaction plot suggests no substantial interaction between the factors (the lines are essentially parallel).

So, I would probably prefer the “no interaction” model.
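One way to formalize this preference is a nested-model F test via anova(); the same anova(model_noint, model_inter) call would apply to the models above. Here is a sketch on simulated two-factor data with no true interaction:

```r
# Nested F test: does adding the interaction detectably improve the fit?
set.seed(431)
sim <- data.frame(f1 = factor(rep(c("a", "b"), each = 30)),
                  f2 = factor(rep(c("x", "y", "z"), times = 20)))
sim$y <- rnorm(60) + ifelse(sim$f1 == "b", 1, 0)  # f1 effect only

m_add <- lm(y ~ f1 + f2, data = sim)  # additive ("no interaction") model
m_int <- lm(y ~ f1 * f2, data = sim)  # model with interaction
anova(m_add, m_int)  # F test on the 2 df added by the interaction terms
```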

Remaining Slides are for Self-Study

The remaining slides provide

  • three more one-factor examples (using the same data) designed for self-study, and

  • one more two-factor example (this time where interaction matters) also designed for self-study.

Use these and the example in Chapter 25 of the Course Notes (one-factor) to guide your work.

Self-Study Examples (not discussed in class)

One-Way ANOVA Question 2

Do groups of counties defined by clinical care show meaningful differences in average health outcomes?

ggplot(ohio20, aes(x = clin_care, y = outcomes, 
                   fill = clin_care)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.25, notch = TRUE, 
               col = c("white", "black", "black")) +
  guides(fill = "none") + 
  scale_fill_viridis_d(option = "C") +
  labs(x = "Clinical Care Ranking (groups)", 
       y = "Health Outcomes (higher = better health)",
       title = "Health Outcomes across County Clinical Care Ranking",
       subtitle = "Ohio's 88 counties, 2020 County Health Rankings",
       caption = "Source: https://www.countyhealthrankings.org/app/ohio/2020/downloads")

One-Way ANOVA Question 2

Question 2 Numerical Summaries

Do groups of counties defined by clinical care show meaningful differences in average health outcomes?

mosaic::favstats(outcomes ~ clin_care, data = ohio20) |>
  rename(na = missing) |> kbl(digits = 2) |> kable_classic_2(font_size = 28)
clin_care min Q1 median Q3 max mean sd n na
Strong -1.90 -0.23 0.44 0.88 2.17 0.34 0.94 30 0
Middle -1.77 -0.35 0.07 0.47 1.42 0.02 0.69 29 0
Weak -2.05 -0.76 -0.33 0.21 1.68 -0.36 0.90 29 0

Question 2 Analysis of Variance

model_two <- lm(outcomes ~ clin_care, data = ohio20)

anova(model_two)
Analysis of Variance Table

Response: outcomes
          Df Sum Sq Mean Sq F value   Pr(>F)   
clin_care  2  7.232  3.6159  4.9807 0.009007 **
Residuals 85 61.708  0.7260                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual Plots for model_two

aug_two <- augment(model_two, ohio20)

p1 <- ggplot(aug_two, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = F,
              lty = "dashed", col = "red") +
  geom_text_repel(data = aug_two |> 
                    slice_max(abs(.resid), n = 3), 
                  aes(label = county)) +
  labs(title = "model_two Residuals vs. Fitted",
       x = "Fitted Value from model_two",
       y = "Residuals from model_two")

p2 <- ggplot(aug_two, aes(sample = .resid)) +
  geom_qq() + geom_qq_line(col = "red") + 
  labs(title = "model_two Residuals",
       y = "")

p3 <- ggplot(aug_two, aes(y = .resid, x = "")) +
  geom_violin(fill = "aquamarine") +
  geom_boxplot(width = 0.5) + 
  labs(y = "", x = "")

p1 + p2 + p3 + plot_layout(widths = c(5, 4, 1))

Residual Plots for model_two

Question 2 Kruskal-Wallis test

kruskal.test(outcomes ~ clin_care, data = ohio20)

    Kruskal-Wallis rank sum test

data:  outcomes by clin_care
Kruskal-Wallis chi-squared = 8.3139, df = 2, p-value = 0.01566

K Samples: Comparing Means

  1. What is the outcome under study?
  2. What are the (in this case, \(K \geq 2\)) treatment/exposure groups?
  3. Were the data in fact collected using independent samples?
  4. Are the data random samples from the population(s) of interest? Or is there at least a reasonable argument for generalizing from the samples to the population(s)?
  5. What is the significance level (or, the confidence level) we require?
  6. Are we doing one-sided or two-sided testing? (usually 2-sided)
  7. What does the distribution of each individual sample tell us about which inferential procedure to use?
  8. Are there statistically detectable differences between population means?
  9. If an overall test rejects the null, can we identify pairwise comparisons of means that show detectable differences using an appropriate procedure that protects against Type I error expansion due to multiple comparisons?

Question 2: 90% Tukey HSD intervals, tidying

model_two <- lm(outcomes ~ clin_care, data = ohio20)
tukey_two <- tidy(TukeyHSD(aov(model_two), 
                           ordered = TRUE, 
                           conf.level = 0.90))
tukey_two |> select(-term, -null.value) |> kbl(digits = 3) |> kable_classic_2()
contrast estimate conf.low conf.high adj.p.value
Middle-Weak 0.381 -0.084 0.847 0.210
Strong-Weak 0.700 0.238 1.161 0.006
Strong-Middle 0.319 -0.143 0.780 0.327
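As a reminder of where these intervals come from, each Tukey HSD comparison of groups \(i\) and \(j\) takes the standard form below, where \(q_{\alpha;\,K,\,N-K}\) is a Studentized range quantile and \(s_p = \sqrt{MS_{Residual}}\) from the ANOVA table:

```latex
(\bar{y}_i - \bar{y}_j) \pm \frac{q_{\alpha;\,K,\,N-K}}{\sqrt{2}} \; s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}
```

The \(q\)-based margin is what protects the familywise error rate across all pairwise comparisons at once.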

Plotting Question 2 Tukey HSD intervals

ggplot(tukey_two, aes(x = reorder(contrast, -estimate), 
                      y = estimate)) +
  geom_crossbar(aes(ymin = conf.low, ymax = conf.high), 
                fatten = 1) + 
  geom_hline(yintercept = 0, col = "red", 
             linetype = "dashed") +
  geom_text(aes(label = round(estimate,2)), nudge_y = 0.1) +
  labs(x = "Contrast between Clinical Care Groups", 
       y = "Estimated Effect, with 90% Tukey HSD interval",
       title = "Estimated Effects, with Tukey HSD 90% Confidence Intervals",
       subtitle = "Comparing Outcomes by Clinical Care Group, ohio20 data")

Plotting Question 2 Tukey HSD intervals

One-Way ANOVA Question 3 (Education)

We have some additional variables in ohio20, specifically:

  • trump16 = proportion of the vote cast in 2016 in the county that went to Former President Trump
  • somecollege = percentage of adults ages 25-44 with some post-secondary education in the county

Question 3 (Education)

Let’s break Ohio’s counties into 5 groups based on somecollege

ohio20 <- ohio20 |> 
  mutate(trump16 = 100*trump16) |>
  mutate(educ = Hmisc::cut2(somecollege, g = 5)) |>
  mutate(educ = fct_recode(educ, "Least" = "[20.4,50.3)", 
          "Low" = "[50.3,54.3)", "Middle" = "[54.3,59.7)", 
          "High" = "[59.7,67.1)", "Most" = "[67.1,85.1]"))

Did Former President Trump’s vote percentage in 2016 vary meaningfully across groups of counties defined by educational attainment?

Trump 2016 % by Educational Attainment

ggplot(ohio20, aes(x = educ, y = trump16, fill = educ)) +
  geom_violin(alpha = 0.25) +
  geom_boxplot(width = 0.25) +
  guides(fill = "none") + 
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  labs(x = "Education Group (2020 County Health Rankings)", 
       y = "Percentage of Vote for Trump in 2016 Election",
       title = "Percentage of Trump Vote by 'Some College' Group",
       subtitle = "Ohio's 88 counties")

Trump 2016 % by Educational Attainment

Numerical Comparison

mosaic::favstats(trump16 ~ educ, data = ohio20) |>
  rename(na = missing) |> kbl(digits = 2) |> kable_classic_2(font_size = 28)
educ min Q1 median Q3 max mean sd n na
Least 57.06 67.64 70.67 73.44 78.89 70.34 5.06 18 0
Low 51.05 65.57 69.06 72.16 78.53 68.72 6.17 18 0
Middle 57.31 65.58 68.75 71.14 76.03 68.39 4.89 17 0
High 38.32 52.34 61.72 67.78 80.58 60.42 12.83 18 0
Most 30.51 47.97 56.95 60.78 79.72 55.08 12.51 17 0
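The spread is worth noting, too. One rough heuristic (an informal rule of thumb, not part of the output above) flags the equal-variance assumption behind the pooled ANOVA when the largest group SD is more than about twice the smallest. Using the printed values:

```r
# Ratio of largest to smallest group SD (High vs. Middle groups);
# a ratio above ~2 suggests the equal-variance assumption is shaky
12.83 / 4.89   # ~ 2.6
```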

Analysis of Variance: Question 3

Does the mean trump16 result differ detectably across the educ groups?

model_3 <- lm(trump16 ~ educ, data = ohio20)

tidy(model_3, conf.int = TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 28)
term estimate std.error conf.low conf.high p.value
(Intercept) 70.34 2.13 66.11 74.58 0.00
educLow -1.62 3.01 -7.61 4.37 0.59
educMiddle -1.95 3.05 -8.02 4.13 0.52
educHigh -9.92 3.01 -15.91 -3.93 0.00
educMost -15.26 3.05 -21.33 -9.18 0.00

ANOVA: Question 3

anova(model_3)
Analysis of Variance Table

Response: trump16
          Df Sum Sq Mean Sq F value    Pr(>F)    
educ       4 2997.1  749.27  9.1867 3.401e-06 ***
Residuals 83 6769.5   81.56                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
glance(model_3) |> 
  select(r.squared, statistic, df, df.residual, p.value)
# A tibble: 1 × 5
  r.squared statistic    df df.residual    p.value
      <dbl>     <dbl> <dbl>       <int>      <dbl>
1     0.307      9.19     4          83 0.00000340
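The glance() output ties directly back to the ANOVA table: R-squared is just the educ sum of squares as a share of the total sum of squares.

```r
# R-squared = SS(educ) / [SS(educ) + SS(Residuals)], from the table above
2997.1 / (2997.1 + 6769.5)   # ~ 0.307
```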

So, what’s the conclusion?

Residual Plots for model_3

aug_3 <- augment(model_3, ohio20)

p1 <- ggplot(aug_3, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              linetype = "dashed", col = "red") +
  geom_text_repel(data = aug_3 |> 
                    slice_max(abs(.resid), n = 3), 
                  aes(label = county)) +
  labs(title = "model_3 Residuals vs. Fitted",
       x = "Fitted Value from model_3",
       y = "Residuals from model_3")

p2 <- ggplot(aug_3, aes(sample = .resid)) +
  geom_qq() + geom_qq_line(col = "red") + 
  labs(title = "model_3 Residuals",
       y = "")

p3 <- ggplot(aug_3, aes(y = .resid, x = "")) +
  geom_violin(fill = "aquamarine") +
  geom_boxplot(width = 0.5) + 
  labs(y = "", x = "")

p1 + p2 + p3 + plot_layout(widths = c(5, 4, 1))

Residual Plots for model_3

Does Kruskal-Wallis give a very different result?

kruskal.test(trump16 ~ educ, data = ohio20)

    Kruskal-Wallis rank sum test

data:  trump16 by educ
Kruskal-Wallis chi-squared = 25.759, df = 4, p-value = 3.539e-05

Tukey HSD 90% CIs: Example 3

tukey_3 <- tidy(TukeyHSD(aov(model_3), ordered = TRUE, conf.level = 0.90))
tukey_3 |> select(-null.value) |> 
  kbl(digits = 3) |> kable_classic_2(font_size = 28)
term contrast estimate conf.low conf.high adj.p.value
educ High-Most 5.340 -2.302 12.982 0.411
educ Middle-Most 13.309 5.559 21.060 0.000
educ Low-Most 13.638 5.995 21.280 0.000
educ Least-Most 15.259 7.617 22.901 0.000
educ Middle-High 7.969 0.327 15.611 0.078
educ Low-High 8.297 0.765 15.829 0.054
educ Least-High 9.919 2.387 17.451 0.012
educ Low-Middle 0.328 -7.314 7.970 1.000
educ Least-Middle 1.950 -5.692 9.592 0.968
educ Least-Low 1.622 -5.911 9.154 0.983

Plotting Tukey HSD intervals

ggplot(tukey_3, aes(x = reorder(contrast, -estimate), 
                      y = estimate)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + 
  geom_hline(yintercept = 0, col = "red", 
             linetype = "dashed") +
  geom_label(aes(label = round_half_up(estimate,1))) +
  coord_flip() +
  labs(x = "Contrast between Education Groups", 
       y = "Estimated Effect, with 90% Tukey HSD interval",
       title = "Estimated Effects, with Tukey HSD 90% Confidence Intervals",
       subtitle = "Comparing Trump16 Vote % by Education Group, ohio20 data")

Plotting Tukey HSD intervals

One-Way ANOVA Question 4

Let’s break Ohio’s counties into 4 groups based on their median income

ohio20 <- ohio20 |> 
  mutate(income = Hmisc::cut2(income, g = 4)) |>
  mutate(income = fct_recode(income, "Lowest" = "[40416, 48792)", 
          "Low" = "[48792, 53904)", "High" = "[53904, 60828)", 
          "Highest" = "[60828,103536]"))

Did Former President Trump’s vote percentage in 2016 vary meaningfully across groups of counties defined by median income?

Trump 2016 % by Income

ggplot(ohio20, aes(x = income, y = trump16, fill = income)) +
  geom_violin(alpha = 0.25) +
  geom_boxplot(width = 0.25) +
  guides(fill = "none") + 
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  labs(x = "Income Group (2020 County Health Rankings)", 
       y = "Percentage of Vote for Trump in 2016 Election",
       title = "Percentage of Trump Vote by Income Group",
       subtitle = "Ohio's 88 counties")

Trump 2016 % by Income

Numerical Comparison

mosaic::favstats(trump16 ~ income, data = ohio20) |>
  rename(na = missing) |> kbl(digits = 2) |> kable_classic_2(font_size = 28)
income min Q1 median Q3 max mean sd n na
Lowest 38.32 64.72 68.94 71.41 76.23 64.71 11.18 22 0
Low 30.51 61.50 66.35 70.87 78.53 64.40 10.71 22 0
High 34.30 59.70 67.87 70.98 76.03 63.73 11.75 22 0
Highest 50.51 60.12 65.28 72.51 80.58 65.80 9.21 22 0

Analysis of Variance: Question 4

Does the mean trump16 result differ detectably across the income groups?

model_4 <- lm(trump16 ~ income, data = ohio20)

tidy(model_4, conf.int = TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 28)
term estimate std.error conf.low conf.high p.value
(Intercept) 64.71 2.29 60.15 69.27 0.00
incomeLow -0.31 3.24 -6.75 6.14 0.93
incomeHigh -0.98 3.24 -7.42 5.47 0.76
incomeHighest 1.09 3.24 -5.36 7.54 0.74

ANOVA for the Linear Model

anova(model_4)
Analysis of Variance Table

Response: trump16
          Df Sum Sq Mean Sq F value Pr(>F)
income     3   48.8  16.272  0.1407 0.9354
Residuals 84 9717.8 115.688               
glance(model_4) |> 
  select(r.squared, statistic, df, df.residual, p.value)
# A tibble: 1 × 5
  r.squared statistic    df df.residual p.value
      <dbl>     <dbl> <dbl>       <int>   <dbl>
1   0.00500     0.141     3          84   0.935

So, what’s the conclusion?

Residual Plots for model_4

aug_4 <- augment(model_4, ohio20)

p1 <- ggplot(aug_4, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              linetype = "dashed", col = "red") +
  geom_text_repel(data = aug_4 |> 
                    slice_max(abs(.resid), n = 3), 
                  aes(label = county)) +
  labs(title = "model_4 Residuals vs. Fitted",
       x = "Fitted Value from model_4",
       y = "Residuals from model_4")

p2 <- ggplot(aug_4, aes(sample = .resid)) +
  geom_qq() + geom_qq_line(col = "red") + 
  labs(title = "model_4 Residuals",
       y = "")

p3 <- ggplot(aug_4, aes(y = .resid, x = "")) +
  geom_violin(fill = "aquamarine") +
  geom_boxplot(width = 0.5) + 
  labs(y = "", x = "")

p1 + p2 + p3 + plot_layout(widths = c(5, 4, 1))

Residual Plots for model_4

Does Kruskal-Wallis give a different result?

kruskal.test(trump16 ~ income, data = ohio20)

    Kruskal-Wallis rank sum test

data:  trump16 by income
Kruskal-Wallis chi-squared = 0.35787, df = 3, p-value = 0.9488

Tukey HSD 90% CIs: Income Groups

tukey_4 <- tidy(TukeyHSD(aov(model_4), ordered = TRUE, conf.level = 0.90))
tukey_4 |> select(-null.value) |> 
  kbl(digits = 3) |> kable_classic_2(font_size = 28)
term contrast estimate conf.low conf.high adj.p.value
income Low-High 0.670 -6.878 8.217 0.997
income Lowest-High 0.975 -6.572 8.523 0.990
income Highest-High 2.063 -5.484 9.611 0.920
income Lowest-Low 0.306 -7.241 7.853 1.000
income Highest-Low 1.394 -6.154 8.941 0.973
income Highest-Lowest 1.088 -6.460 8.635 0.987

Plotting Tukey HSD intervals (Income Groups)

ggplot(tukey_4, aes(x = reorder(contrast, -estimate), 
                      y = estimate)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) + 
  geom_hline(yintercept = 0, col = "red", 
             linetype = "dashed") +
  geom_label(aes(label = round_half_up(estimate,2))) +
  coord_flip() +
  labs(x = "Contrast between Income Groups", 
       y = "Estimated Effect, with 90% Tukey HSD interval",
       title = "Estimated Effects, with Tukey HSD 90% Confidence Intervals",
       subtitle = "Comparing Trump16 Vote % by Income Group, ohio20 data")

Plotting Tukey HSD intervals (Income Groups)

K Samples: Comparing Means

  1. What is the outcome under study?
  2. What are the (in this case, \(K \geq 2\)) treatment/exposure groups?
  3. Were the data in fact collected using independent samples?
  4. Are the data random samples from the population(s) of interest? Or is there at least a reasonable argument for generalizing from the samples to the population(s)?
  5. What is the significance level (or, the confidence level) we require?
  6. Are we doing one-sided or two-sided testing? (usually 2-sided)
  7. What does the distribution of each individual sample tell us about which inferential procedure to use?
  8. Are there statistically detectable differences between population means?
  9. If an overall test rejects the null, can we identify pairwise comparisons of means that show detectable differences using an appropriate procedure that protects against Type I error expansion due to multiple comparisons?

New Two-Way ANOVA Example

Suppose we’re interested in the mean trump16 and how it might be influenced by both the income and the educ groups.

model_5noint <- lm(trump16 ~ income + educ, data = ohio20)
model_5int <- lm(trump16 ~ income * educ, data = ohio20)

anova(model_5int)
Analysis of Variance Table

Response: trump16
            Df Sum Sq Mean Sq F value    Pr(>F)    
income       3   48.8   16.27  0.3395  0.796813    
educ         4 5114.7 1278.68 26.6792 1.596e-13 ***
income:educ  9 1200.2  133.35  2.7824  0.007511 ** 
Residuals   71 3402.9   47.93                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interaction Plot is a plot of means

Calculate the group means.

out_summary5 <- ohio20 |>
  group_by(income, educ) |>
  summarise(mean_trump = mean(trump16, na.rm = TRUE))

out_summary5
# A tibble: 17 × 3
# Groups:   income [4]
   income  educ   mean_trump
   <fct>   <fct>       <dbl>
 1 Lowest  Least        67.9
 2 Lowest  Low          68.5
 3 Lowest  Middle       69.7
 4 Lowest  High         41.2
 5 Low     Least        73.0
 6 Low     Low          68.6
 7 Low     Middle       64.5
 8 Low     High         58.3
 9 Low     Most         39.2
10 High    Least        69.8
11 High    Low          69.1
12 High    Middle       70.9
13 High    High         60.7
14 High    Most         44.3
15 Highest Least        73.4
16 Highest High         68.3
17 Highest Most         61.9

Now, build the interaction plot.

Looking for a substantial interaction (non-parallel lines)

ggplot(out_summary5, aes(x = income, y = mean_trump)) +
  geom_line(aes(group = educ, color = educ)) +
  geom_point(aes(color = educ))

Now, build the interaction plot.

Two-Way ANOVA without Interaction

anova(model_5noint)
Analysis of Variance Table

Response: trump16
          Df Sum Sq Mean Sq F value    Pr(>F)    
income     3   48.8   16.27  0.2828    0.8377    
educ       4 5114.7 1278.68 22.2231 2.305e-12 ***
Residuals 80 4603.1   57.54                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two-Way ANOVA without Interaction

tidy(model_5noint, conf.int = TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 16)
term estimate std.error conf.low conf.high p.value
(Intercept) 66.89 2.07 63.45 70.34 0.00
incomeLow 1.98 2.34 -1.91 5.87 0.40
incomeHigh 4.22 2.48 0.10 8.34 0.09
incomeHighest 15.99 2.79 11.35 20.63 0.00
educLow 0.00 2.59 -4.31 4.30 1.00
educMiddle -1.19 2.74 -5.75 3.38 0.67
educHigh -14.84 2.69 -19.33 -10.36 0.00
educMost -23.38 2.95 -28.28 -18.48 0.00

ANOVA setup for No Interaction model

anova(model_5noint)
Analysis of Variance Table

Response: trump16
          Df Sum Sq Mean Sq F value    Pr(>F)    
income     3   48.8   16.27  0.2828    0.8377    
educ       4 5114.7 1278.68 22.2231 2.305e-12 ***
Residuals 80 4603.1   57.54                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Proportion of total SS explained by the two factors?
    • (48.8 + 5114.7) / (48.8 + 5114.7 + 4603.1) = 5163.5 / 9766.6 = 0.529

Residual Plots?

par(mfrow = c(1,2)); plot(model_5noint, which = 1:2); par(mfrow = c(1,1))

Two-Way ANOVA with Interaction

anova(model_5int)
Analysis of Variance Table

Response: trump16
            Df Sum Sq Mean Sq F value    Pr(>F)    
income       3   48.8   16.27  0.3395  0.796813    
educ         4 5114.7 1278.68 26.6792 1.596e-13 ***
income:educ  9 1200.2  133.35  2.7824  0.007511 ** 
Residuals   71 3402.9   47.93                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Proportion of total SS explained by the interaction?
    • 1200.2 / (48.8 + 5114.7 + 1200.2 + 3402.9) = 1200.2 / 9766.6 = 0.123
  • Proportion of explained SS coming from interaction?
    • 1200.2 / (48.8 + 5114.7 + 1200.2) = 1200.2 / 6363.7 = 0.189
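Equivalently, the interaction test is a nested-model comparison, which `anova(model_5noint, model_5int)` would run directly in R. Sketching the partial F statistic from the printed residual sums of squares:

```r
# Partial F test for interaction: the 9-df drop in residual SS from
# the no-interaction model, over the full model's residual mean square
((4603.1 - 3402.9) / 9) / (3402.9 / 71)   # ~ 2.78, the income:educ F value
```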

Two-Way ANOVA with Interaction

tidy(model_5int, conf.int = TRUE, conf.level = 0.90) |> 
  select(term, estimate, std.error, conf.low, conf.high, p.value) |> 
  kbl(digits = 2) |> kable_classic_2(font_size = 18)
term estimate std.error conf.low conf.high p.value
(Intercept) 67.93 2.31 64.08 71.77 0.00
incomeLow 5.07 3.86 -1.37 11.50 0.19
incomeHigh 1.84 7.30 -10.32 14.01 0.80
incomeHighest 5.45 4.62 -2.24 13.14 0.24
educLow 0.61 3.49 -5.20 6.43 0.86
educMiddle 1.74 4.62 -5.95 9.43 0.71
educHigh -26.77 4.62 -34.46 -19.08 0.00
educMost -11.48 4.51 -19.00 -3.97 0.01
incomeLow:educLow -5.00 5.45 -14.09 4.09 0.36
incomeHigh:educLow -1.26 8.35 -15.17 12.65 0.88
incomeHighest:educLow NA NA NA NA NA
incomeLow:educMiddle -10.25 6.23 -20.64 0.14 0.10
incomeHigh:educMiddle -0.66 8.67 -15.11 13.80 0.94
incomeHighest:educMiddle NA NA NA NA NA
incomeLow:educHigh 12.07 6.85 0.66 23.48 0.08
incomeHigh:educHigh 17.70 9.01 2.68 32.72 0.05
incomeHighest:educHigh 21.71 6.58 10.74 32.67 0.00
incomeLow:educMost -22.27 7.34 -34.50 -10.04 0.00
incomeHigh:educMost -14.01 8.96 -28.94 0.92 0.12
incomeHighest:educMost NA NA NA NA NA
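The NA rows are not a bug: some income-educ combinations contain no Ohio counties (the group-means table earlier had 17 rows, not 4 x 5 = 20), so the design matrix is rank-deficient and lm() reports the aliased interaction coefficients as NA. A minimal sketch with synthetic (made-up) data shows the same behavior:

```r
# Synthetic example: when a factor combination has no observations,
# lm() cannot estimate that interaction term and returns NA for it
dat <- data.frame(
  f1 = factor(c("a", "a", "b", "b", "a", "b")),
  f2 = factor(c("x", "y", "x", "x", "x", "x")),
  y  = c(1, 2, 3, 4, 1.5, 3.5)
)

table(dat$f1, dat$f2)                 # the (b, y) cell is empty
coef(lm(y ~ f1 * f2, data = dat))     # the f1b:f2y coefficient is NA
```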

Residual Plots (with interaction)?

par(mfrow = c(1,2)); plot(model_5int, which = 1:2); par(mfrow = c(1,1))

What conclusions should we draw?

  • The interaction term accounts for a relatively large percentage of the variation in our outcome, and for a large percentage (nearly 20%) of the variation we explain with our model.
  • The interaction plot suggests substantial interaction between the factors (the lines are far from parallel).

So, I would prefer the “interaction” model.

Session Information

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3   forcats_1.0.0     stringr_1.5.0     purrr_1.0.2      
 [5] readr_2.1.4       tidyr_1.3.0       tibble_3.2.1      tidyverse_2.0.0  
 [9] patchwork_1.1.3   naniar_1.0.0      janitor_2.2.0     mosaic_1.8.4.2   
[13] mosaicData_0.20.3 ggformula_0.10.4  dplyr_1.1.3       Matrix_1.6-1.1   
[17] lattice_0.21-8    Hmisc_5.1-1       kableExtra_1.3.4  broom_1.0.5      
[21] ggdist_3.3.0      ggrepel_0.9.4     ggplot2_3.4.4     readxl_1.4.3     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0     viridisLite_0.4.2    farver_2.1.1        
 [4] fastmap_1.1.1        tweenr_2.0.2         labelled_2.12.0     
 [7] digest_0.6.33        rpart_4.1.19         timechange_0.2.0    
[10] lifecycle_1.0.3      cluster_2.1.4        ellipsis_0.3.2      
[13] magrittr_2.0.3       compiler_4.3.1       rlang_1.1.1         
[16] tools_4.3.1          utf8_1.2.3           yaml_2.3.7          
[19] data.table_1.14.8    knitr_1.44           labeling_0.4.3      
[22] htmlwidgets_1.6.2    ggstance_0.3.6       RColorBrewer_1.1-3  
[25] xml2_1.3.5           withr_2.5.1          foreign_0.8-84      
[28] nnet_7.3-19          grid_4.3.1           polyclip_1.10-6     
[31] mosaicCore_0.9.2.1   fansi_1.0.5          colorspace_2.1-0    
[34] scales_1.2.1         MASS_7.3-60          ggridges_0.5.4      
[37] cli_3.6.1            rmarkdown_2.25       generics_0.1.3      
[40] rstudioapi_0.15.0    tzdb_0.4.0           httr_1.4.7          
[43] ggforce_0.4.1        splines_4.3.1        rvest_1.0.3         
[46] cellranger_1.1.0     base64enc_0.1-3      vctrs_0.6.4         
[49] webshot_0.5.5        jsonlite_1.8.7       hms_1.1.3           
[52] visdat_0.6.0         Formula_1.2-5        htmlTable_2.4.1     
[55] systemfonts_1.0.5    glue_1.6.2           distributional_0.3.2
[58] stringi_1.7.12       gtable_0.3.4         quadprog_1.5-8      
[61] munsell_0.5.0        pillar_1.9.0         htmltools_0.5.6.1   
[64] R6_2.5.1             evaluate_0.22        highr_0.10          
[67] haven_2.5.3          backports_1.4.1      snakecase_0.11.1    
[70] Rcpp_1.0.11          nlme_3.1-162         svglite_2.1.2       
[73] gridExtra_2.3        checkmate_2.3.0      mgcv_1.8-42         
[76] xfun_0.40            pkgconfig_2.0.3